An Integrated OCR Software for Mathematical Documents and Its Output with Accessibility

Authors

  • Masakazu Suzuki
  • Toshihiro Kanahori
  • Nobuyuki Ohtake
  • Katsuhito Yamaguchi
Abstract

This paper briefly describes ‘Infty’, a practical integrated system for scientific documents that include mathematical formulae. The system consists of three components: an OCR system named ‘InftyReader’, an editor named ‘InftyEditor’, and tools for converting into various formats. These applications are linked to each other via XML files. InftyReader recognizes scanned images of clearly printed mathematical documents and outputs the recognition results in an XML format. It recognizes the complex mathematical formulae, including matrices, used in research papers in various areas of mathematics. InftyEditor provides a very efficient interface for correcting the recognition results using the keyboard. Other features of InftyEditor are its handwriting interface for inputting mathematical formulae, intended for sighted users, and its speech interface for visually disabled users. The XML files output by InftyReader/Editor can be converted into various formats: LaTeX, MathML, HTML and Braille Codes; in UBC (Unified Braille Codes) for English texts and in Japanese Braille Codes for


Similar resources

A Survey of Math Accessibility For Blind Persons and An Investigation on Text/Math Separation

Despite recent advances, blind students, researchers, and professionals lack easy access to mathematical resources. This lack of access is a barrier to higher education for many blind students and puts them at an unfair disadvantage in school, academia, and industry. A survey of current mathematical accessibility technologies for blind persons is covered in this paper, encompassing reading, wri...


Extraction of Logical Structure from Articles in Mathematics

We propose a mathematical knowledge browser that helps people read mathematical documents. With the browser, printed mathematical documents can be scanned and recognized by OCR (Optical Character Recognition). The meta-information (e.g. title, author) and the logical structure (e.g. section, theorem) of the documents are then extracted automatically. The purpose of this paper is to show the ex...


Probabilistic Management of OCR Data using an RDBMS

The digitization of scanned forms and documents is changing the data sources that enterprises manage. To integrate these new data sources with enterprise data, the current state-of-the-art approach is to convert the images to ASCII text using optical character recognition (OCR) software and then to store the resulting ASCII text in a relational database. The OCR problem is challenging, and so t...


JBIG2 Supported by OCR

Digital mathematical libraries contain a large volume of PDF documents containing scanned text. In this paper we describe how these documents can be compressed and thus provided more effectively to users. We introduce the JBIG2 standard for compressing bitonal images such as scanned text, and we discuss the issues that arise if OCR is used to improve the compression ratio of the jbig2enc open-source encode...


Accessibility Evaluation in Biometric Hybrid Architecture for Protecting Social Networks Using Colored Petri Nets

In the last few decades, technological progress has produced important information systems that require high security and use safe and efficient methods for protecting their privacy. Protecting vital data against the threat of attackers is a major challenge. This has made it important and necessary to pay close attention to the authentication and identification of individuals in confidential n...



Publication date: 2004